# MAP

# Clean Tagged Sequences Script

## Description

This Python script, `MAP_to_fasta.py`, reads a CSV file of tagged sequences and removes the tags to produce clean sequences. Tags are expected to be enclosed in curly braces (`{}`). The cleaned sequences can be exported in either CSV or FASTA format.
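
The core cleaning step can be illustrated with a regular expression. The sketch below is an assumption about how tag removal might work, not the script's actual code. It deletes every `{...}` span and assumes tags are never nested. The name `strip_tags` is hypothetical.

```python
import re

# Matches one curly-brace tag such as "{phospho}"; assumes tags are never nested.
TAG_PATTERN = re.compile(r"\{[^{}]*\}")


def strip_tags(tagged_seq):
    """Hypothetical helper: return the sequence with all {tag} spans removed."""
    return TAG_PATTERN.sub("", tagged_seq)


if __name__ == "__main__":
    print(strip_tags("AC{phospho}GT{me}ACGT"))  # ACGTACGT
```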

## Installation

1.  **Python:** Ensure you have Python 3.6 or later installed. You can check your Python version by running `python --version` or `python3 --version` in your terminal.

2.  **Pandas:** The script uses the Pandas library for CSV file handling. If you don't have it installed, you can install it using pip:

    ```bash
    pip install pandas
    ```

## Usage

To use the script, open your terminal and run the following command:

```bash
python MAP_to_fasta.py -i <input_csv_file> -f <output_format> [options]
```

### Arguments

- `-i`, `--input` (required): Path to the input CSV file containing the tagged sequences.
- `-f`, `--format` (required): Output format. Use `f` for a FASTA file or `c` for a CSV file.
- `-o`, `--output` (optional): Base name for the output file. Defaults to `Cleaned_Sequences`.
- `-org`, `--org` (optional): Organism name to include in the FASTA header. Defaults to `undefine`.
- `-func`, `--func` (optional): Function description to include in the FASTA header. Defaults to `unknown`.
- `--prefix` (optional): Prefix for the FASTA header IDs. Defaults to `Sample`.
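
For reference, the documented options map naturally onto Python's `argparse`. This is a sketch reconstructed from the argument list above, not the script's own parser; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Remove {tag} annotations from sequences in a CSV file."
    )
    parser.add_argument("-i", "--input", required=True,
                        help="Path to the input CSV file.")
    parser.add_argument("-f", "--format", required=True, choices=["f", "c"],
                        help="Output format: f = FASTA, c = CSV.")
    parser.add_argument("-o", "--output", default="Cleaned_Sequences",
                        help="Base name for the output file.")
    # argparse accepts multi-character single-dash options such as -org.
    parser.add_argument("-org", "--org", default="undefine",
                        help="Organism name for the FASTA header.")
    parser.add_argument("-func", "--func", default="unknown",
                        help="Function description for the FASTA header.")
    parser.add_argument("--prefix", default="Sample",
                        help="Prefix for the FASTA header IDs.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-i", "input.csv", "-f", "f", "-org", "Human"])
    print(args.output, args.org, args.func, args.prefix)
```

Running the block prints `Cleaned_Sequences Human unknown Sample`, showing how unspecified options fall back to their documented defaults.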

## Input CSV File

The input CSV file should contain the tagged sequences. The script automatically detects whether the file has a header row:

- If a header is present, the script uses the first column whose values contain tags in the `{tag}` format.
- If no header is present, the script assumes that the first column (index 0) contains the tagged sequences and names it `MAP_seq`.
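
The column-selection rule can be sketched with the standard library. This illustrates the documented behavior under stated assumptions and is not the script's implementation (which uses pandas). Header detection itself is left to the caller here, for example via pandas or the heuristic `csv.Sniffer.has_header`, and the helper name `find_tagged_column` is hypothetical.

```python
import re

# Assumed tag syntax: a non-nested {...} span inside a sequence.
TAG_PATTERN = re.compile(r"\{[^{}]*\}")


def find_tagged_column(rows, has_header):
    """Hypothetical helper: return (column_name, column_index) of the tagged column.

    rows is a list of CSV rows (lists of strings); has_header says whether
    rows[0] is a header row.
    """
    if not has_header:
        # No header: assume column 0 holds the sequences and call it MAP_seq.
        return "MAP_seq", 0
    header, data = rows[0], rows[1:]
    for idx, name in enumerate(header):
        # Pick the first column in which at least one value contains a {tag}.
        if any(idx < len(row) and TAG_PATTERN.search(row[idx]) for row in data):
            return name, idx
    raise ValueError("No column with {tag}-formatted sequences found.")


if __name__ == "__main__":
    rows = [["id", "peptide"], ["1", "AC{x}GT"], ["2", "GATTACA"]]
    print(find_tagged_column(rows, has_header=True))  # ('peptide', 1)
```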

## Output

The output depends on the specified format:

- **CSV output (`-f c`):** A CSV file named `<output_name>.csv` is created. It contains all the original columns from the input CSV plus a new `Clean Sequence` column holding the cleaned sequences.
- **FASTA output (`-f f`):** A FASTA file named `<output_name>.fasta` is created, containing the cleaned sequences. Each sequence has a header line of the form:

```text
>{prefix}_{sequence_number} {org:{organism_name}} {func:{function_description}}
```

For example:

```text
>Sample_1 {org:Human} {func:Enzyme}
ACGTACGT
>Sample_2 {org:Mouse} {func:Transporter}
GATTACA
```
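
The header layout above can be produced with an f-string, where literal braces must be doubled. This is a minimal sketch, assuming records are numbered from 1 in input order and that `org` and `func` are the single values passed on the command line; `to_fasta` is a hypothetical name.

```python
def to_fasta(sequences, prefix="Sample", org="undefine", func="unknown"):
    """Hypothetical helper: format cleaned sequences as FASTA text."""
    lines = []
    for number, seq in enumerate(sequences, start=1):
        # Doubled braces ({{ }}) emit literal { } characters in an f-string.
        lines.append(f">{prefix}_{number} {{org:{org}}} {{func:{func}}}")
        lines.append(seq)
    # Terminate every line with a newline; an empty input yields "".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(to_fasta(["ACGTACGT", "GATTACA"], org="Human", func="Enzyme"), end="")
```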


## Examples

Clean sequences and save them to a CSV file named `cleaned_data.csv`:

```bash
python MAP_to_fasta.py -i input.csv -f c -o cleaned_data
```

Clean sequences and save them to a FASTA file named `sequences.fasta`, specifying the organism and function:

```bash
python MAP_to_fasta.py -i input.csv -f f -o sequences -org Human -func Enzyme
```

Clean sequences from a CSV file without a header and save them to a FASTA file with the default output name (`Cleaned_Sequences.fasta`):

```bash
python MAP_to_fasta.py -i no_header.csv -f f
```


## Error Handling

- If no column with tags in the `{tag}` format is found in the input CSV, the script raises a `ValueError` and exits.
- If an error occurs while cleaning a sequence, that sequence is replaced with `ERROR` in the output.
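
The per-sequence fallback can be sketched as a `try`/`except` around the cleaning step. Which exceptions the real script catches is not documented; this sketch assumes the typical failure is a non-string cell (for example `None`, or a missing value that pandas reads as a float `NaN`). The name `clean_or_error` is hypothetical.

```python
import re

# Assumed tag syntax: a non-nested {...} span inside a sequence.
TAG_PATTERN = re.compile(r"\{[^{}]*\}")


def clean_or_error(value):
    """Hypothetical helper: strip {tag} spans, or return "ERROR" on failure."""
    try:
        return TAG_PATTERN.sub("", value)
    except TypeError:
        # re.sub raises TypeError for non-string input such as None or a float NaN.
        return "ERROR"


if __name__ == "__main__":
    print([clean_or_error(v) for v in ["AC{x}GT", None, float("nan")]])
```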

## Help

To see a description of all the arguments, use the help flag:

```bash
python MAP_to_fasta.py -h
```